
Research objectives

Question 1

What are the contributions of this study?

Write a brief summary (no more than 1.5 pages) that discusses the contributions of this study. You might wish to consider the following questions: What are the objectives, and how do they relate to the established literature? What is novel about the empirical strategy? What are the results and how convincing are these?

Answer 1: This paper studies how changes that make the unemployment insurance system more generous affect the duration of unemployment. A considerable theoretical literature has shown that a more generous system, through a higher benefit replacement rate (RR) and/or a longer potential benefit duration (PBD), reduces the optimal job-search effort of an unemployed worker and hence results in longer unemployment spells. Job search theory points in the same direction: as the duration of unemployment increases, individuals tend to lower their reservation wage under financial pressure and the desire to avoid long-term unemployment, so if this pressure is relieved by a higher RR and/or a longer PBD, it is understandable that unemployment duration may increase. Job search theory has also motivated many empirical studies over the last decade.

In this study, as anticipated above, Lalive, van Ours, and Zweimüller investigate how changes in financial incentives, specifically the benefit replacement rate (RR) and the potential benefit duration (PBD) of unemployment insurance, affect the duration of unemployment. The objectives of the study are to quantify the effects of these two key parameters on unemployment duration and to provide insights for policymakers designing effective unemployment insurance systems.

The study contributes to the existing literature by focusing on the interplay between RR and PBD, which is a relatively understudied area. While previous research has examined the individual effects of RR or PBD changes on unemployment behavior, this study uniquely explores the combined impact of these parameters. By analyzing data from Austria, the authors are able to leverage a natural experiment that occurred in the late 1980s, providing a valuable opportunity to assess the effects of simultaneous changes in RR and PBD.

One of the key novelties of the empirical strategy is the use of longitudinal individual data from Austrian social security databases, allowing for a detailed analysis of unemployment spells and transitions. The study carefully constructs groups of job seekers based on income, age, and work experience criteria to isolate the effects of RR and PBD changes. This meticulous approach enhances the credibility of the results and strengthens the internal validity of the study.

The results of the study reveal several important findings. The authors find that increases in PBD have a more substantial impact on unemployment duration compared to changes in RR. Older workers exhibit stronger reactions to PBD extensions, highlighting the importance of considering age-related differences in policy design. The study also demonstrates that the combined effect of simultaneous changes in RR and PBD differs from the sum of individual effects, suggesting potential interaction effects between these parameters.

Overall, the results of the study are robust and provide valuable insights for policymakers. The findings underscore the significance of PBD in influencing unemployment behavior and suggest that adjusting the potential duration of benefits may be a more effective policy tool than altering benefit levels. By shedding light on the relative importance of RR and PBD in shaping unemployment outcomes, this study offers practical implications for designing efficient unemployment insurance systems.

Data Preparation

Loading data

We begin with some preliminary steps: loading the libraries and the data.

# Libraries loading
library(foreign)    ## read.dta() to import Stata files
library(survival)   ## Surv(), survfit(), coxph()
library(tidyverse)  ## data wrangling and plotting (already attaches dplyr)
library(dplyr)
library(survminer)  ## ggsurvplot()
library(gt)         ## formatted tables

Then we check the dimensions of our dataset and the presence of NAs.

# Data loading

udat <- read.dta("fi.dta")  
udat <- udat[,1:134] ## get rid of some superfluous variables
udat <- as_tibble(udat)

dim(udat)
## [1] 225821    134
sum(is.na(udat))
## [1] 0
# Data Visualization

glimpse(udat[,1:36])
## Rows: 225,821
## Columns: 36
## $ beginn   <dbl> 10412, 10169, 10624, 10343, 10654, 10101, 10652, 10338, 10705…
## $ ein_zus  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0…
## $ sfrau    <dbl> 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0…
## $ age      <dbl> 51.13758, 54.71595, 54.04791, 54.11636, 46.13005, 42.11909, 5…
## $ after    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ dur      <dbl> 0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428…
## $ nwage_pj <dbl> 11282.729, 8577.623, 6828.051, 10071.884, 6342.333, 6094.196,…
## $ e3_5     <dbl> 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ bdur     <dbl> 30, 30, 30, 20, 30, 30, 30, 20, 30, 30, 30, 30, 30, 30, 30, 3…
## $ y1988    <dbl> 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1…
## $ y1989    <dbl> 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0…
## $ y1990    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ y1991    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ med_educ <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ hi_educ  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ lehre    <dbl> 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0…
## $ married  <dbl> 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1…
## $ single   <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ divorced <dbl> 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0…
## $ f_marr   <dbl> 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0…
## $ f_single <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ f_divor  <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ bc       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1…
## $ pnon_10  <dbl> 0.2651971579, 0.1003559679, 0.1081599146, 0.3736308813, 0.033…
## $ q2       <dbl> 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1…
## $ q3       <dbl> 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0…
## $ q4       <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0…
## $ seasonal <dbl> 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0…
## $ manuf    <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0…
## $ ten72    <dbl> -3.47749186, -1.04384339, 0.23723496, -2.63423371, -2.5197806…
## $ type     <fct> PBD and RR, PBD and RR, PBD and RR, PBD and RR, PBD and RR, P…
## $ uncc     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ lwage    <dbl> 6.173206, 5.918063, 5.645761, 6.059680, 5.571968, 5.576242, 5…
## $ tr       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ t39      <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0…
## $ t52      <dbl> 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1…
#head(data)
#summary(data)

This dataset contains many attributes, some of the most important are:

  • type: factor variable that defines the four groups
  • dur: duration of the unemployment spell, in weeks
  • uncc: equals 1 if the spell is not censored, 0 otherwise
  • after: equals 1 if the spell starts after August 1, 1989

Here you can find a more in-depth list of the variables used and what they represent:

List of variables

table(udat$type)
## 
## PBD and RR        PBD         RR    control 
##      21174      99404      32470      72773
summary(udat$dur)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.0712   6.8387  11.6826  18.5494  19.9456 644.0881
## Truncate durations at 104 weeks for the computation of average spells
udat <- udat %>%
  mutate(dur104 = pmin(dur, 104))

Difference-in-Differences

Question 2

(difference-in-differences) Attempt to replicate Table 4 of the paper:

Table 4 from the paper

Answer 2:

BASIC STEP BY STEP:

basic = udat %>%
  group_by(type, after) %>%
  summarize(mean = round(mean(dur104),2),
            n = n(),
            se =  round((sd(dur104) / sqrt(n)),2))
basic
## # A tibble: 8 × 5
## # Groups:   type [4]
##   type       after  mean     n    se
##   <fct>      <dbl> <dbl> <int> <dbl>
## 1 PBD and RR     0  18.5 11992  0.16
## 2 PBD and RR     1  22.7  9182  0.23
## 3 PBD            0  15.8 48294  0.08
## 4 PBD            1  18.1 51110  0.09
## 5 RR             0  17.1 17160  0.12
## 6 RR             1  19.1 15310  0.15
## 7 control        0  14.5 33815  0.08
## 8 control        1  15.6 38958  0.09

PIVOTING:

diff_in_diff <- udat %>%
  group_by(type, after) %>%
  summarize(mean_outcome = mean(dur104),
            n_obs = n(),
            se_outcome = sd(dur104) / sqrt(n_obs)) %>%
  pivot_wider(names_from = after, values_from = c(mean_outcome, se_outcome, n_obs)) %>%
  mutate(change_mean = mean_outcome_1 - mean_outcome_0,
         ## note: reported here as the simple difference of the two standard
         ## errors; the standard error of the change itself would instead be
         ## sqrt(se_outcome_0^2 + se_outcome_1^2)
         change_se = se_outcome_1 - se_outcome_0)

diff_in_diff <- diff_in_diff %>%
  ## subtract the control group's change in mean duration (4th row)
  mutate(DiD_mean = change_mean - diff_in_diff$change_mean[4])
#print(diff_in_diff)
diff_in_diff1 <- data.frame(
  Group = levels(udat$type),
  before_mean = diff_in_diff$mean_outcome_0,
  before_se = diff_in_diff$se_outcome_0,
  before_n = diff_in_diff$n_obs_0,
  after_mean = diff_in_diff$mean_outcome_1,
  after_se = diff_in_diff$se_outcome_1,
  after_n = diff_in_diff$n_obs_1,
  change_mean= diff_in_diff$change_mean,
  change_se= diff_in_diff$change_se,
  DiD_mean= diff_in_diff$DiD_mean
)

knitr::kable(diff_in_diff1, align = "c", digits = 2, caption = "My Table for DiD")
My Table for DiD

| Group      | before_mean | before_se | before_n | after_mean | after_se | after_n | change_mean | change_se | DiD_mean |
|------------|-------------|-----------|----------|------------|----------|---------|-------------|-----------|----------|
| PBD and RR | 18.49       | 0.16      | 11992    | 22.74      | 0.23     | 9182    | 4.25        | 0.07      | 3.08     |
| PBD        | 15.83       | 0.08      | 48294    | 18.08      | 0.09     | 51110   | 2.25        | 0.02      | 1.08     |
| RR         | 17.11       | 0.12      | 17160    | 19.10      | 0.15     | 15310   | 1.99        | 0.03      | 0.82     |
| control    | 14.46       | 0.08      | 33815    | 15.63      | 0.09     | 38958   | 1.17        | 0.01      | 0.00     |
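As a sanity check, each treated group's DiD is just its before/after change in mean duration minus the control group's change; plugging in the means from the table above:

```r
# DiD by hand from the cell means reported in the table above
means <- data.frame(
  group  = c("PBD and RR", "PBD", "RR", "control"),
  before = c(18.49, 15.83, 17.11, 14.46),
  after  = c(22.74, 18.08, 19.10, 15.63)
)
means$change <- means$after - means$before
# subtract the control group's change from every group's change
means$DiD <- means$change - means$change[means$group == "control"]
round(means$DiD, 2)
## [1] 3.08 1.08 0.82 0.00
```

which reproduces the DiD_mean column exactly.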

REGRESSION:

levels(udat$type)
## [1] "PBD and RR" "PBD"        "RR"         "control"
udat$type <- relevel(udat$type, ref = "control")
levels(udat$type)
## [1] "control"    "PBD and RR" "PBD"        "RR"
lm1 <- lm(dur104 ~ type + after, data = udat)
summary(lm1)
## 
## Call:
## lm(formula = dur104 ~ type + after, data = udat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.421  -9.953  -4.967   2.869  90.009 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    13.99138    0.07722  181.19   <2e-16 ***
## typePBD and RR  5.45200    0.13913   39.19   <2e-16 ***
## typePBD         1.94545    0.08681   22.41   <2e-16 ***
## typeRR          3.08791    0.11883   25.98   <2e-16 ***
## after           2.04901    0.07503   27.31   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.79 on 225816 degrees of freedom
## Multiple R-squared:  0.01062,    Adjusted R-squared:  0.0106 
## F-statistic: 605.7 on 4 and 225816 DF,  p-value: < 2.2e-16
coef(lm1)[3:4] + coef(lm1)[1]  ## fitted mean duration before the reform (PBD and RR groups)
##  typePBD   typeRR 
## 15.93683 17.07929
coef(lm1)[3:4] + coef(lm1)[1] + coef(lm1)["after"]  ## fitted mean duration after the reform
##  typePBD   typeRR 
## 17.98584 19.12830

A small note regarding this point: the solutions proposed by the teacher report the same result as this model, so I assume this is the model that was used. However, it would arguably be more accurate to add an interaction term between type and after, in which case the model would look like this:

lm2 <- lm(dur104 ~ type * after, data = udat)
summary(lm2)
## 
## Call:
## lm(formula = dur104 ~ type * after, data = udat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.669  -9.933  -4.864   2.860  89.538 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          14.46226    0.09673 149.518  < 2e-16 ***
## typePBD and RR        4.02607    0.18904  21.297  < 2e-16 ***
## typePBD               1.37176    0.12612  10.876  < 2e-16 ***
## typeRR                2.64590    0.16671  15.871  < 2e-16 ***
## after                 1.16942    0.13220   8.846  < 2e-16 ***
## typePBD and RR:after  3.08199    0.27985  11.013  < 2e-16 ***
## typePBD:after         1.07954    0.17383   6.210  5.3e-10 ***
## typeRR:after          0.81838    0.23786   3.441  0.00058 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.79 on 225813 degrees of freedom
## Multiple R-squared:  0.01117,    Adjusted R-squared:  0.01114 
## F-statistic: 364.5 on 7 and 225813 DF,  p-value: < 2.2e-16
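Notice that the interaction coefficients above (3.08, 1.08, 0.82) reproduce the DiD column of the earlier table exactly. This is a general property: in a saturated two-way model, the interaction coefficient is algebraically the difference-in-differences of the four cell means. A minimal sketch with simulated (hypothetical) data illustrates this:

```r
# In a saturated model y ~ treat*after, the interaction coefficient equals
# the difference-in-differences of the four cell means
set.seed(1)
d <- expand.grid(treat = 0:1, after = 0:1)
d <- d[rep(1:4, each = 50), ]
d$y <- 10 + 2 * d$treat + 1 * d$after + 3 * d$treat * d$after + rnorm(nrow(d))

fit   <- lm(y ~ treat * after, data = d)
cells <- tapply(d$y, list(d$treat, d$after), mean)  # rows: treat, cols: after
did   <- (cells["1", "1"] - cells["1", "0"]) - (cells["0", "1"] - cells["0", "0"])

all.equal(unname(coef(fit)["treat:after"]), did)
## [1] TRUE
```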

Another way to obtain the group means directly as coefficients would be to suppress the constant of the linear model:

lm2 <- lm(dur104 ~ type + after - 1, data = udat)
summary(lm2)
## 
## Call:
## lm(formula = dur104 ~ type + after - 1, data = udat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.421  -9.953  -4.967   2.869  90.009 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## typecontrol    13.99138    0.07722  181.19   <2e-16 ***
## typePBD and RR 19.44338    0.12652  153.67   <2e-16 ***
## typePBD        15.93683    0.06836  233.14   <2e-16 ***
## typeRR         17.07929    0.10488  162.84   <2e-16 ***
## after           2.04901    0.07503   27.31   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.79 on 225816 degrees of freedom
## Multiple R-squared:  0.4756, Adjusted R-squared:  0.4756 
## F-statistic: 4.096e+04 on 5 and 225816 DF,  p-value: < 2.2e-16
coef(lm2)[3:4] 
##  typePBD   typeRR 
## 15.93683 17.07929
coef(lm2)[3:4] + coef(lm2)["after"]
##  typePBD   typeRR 
## 17.98584 19.12830

Survival Function

Question 3

(Survival Functions) Seek to reproduce Figure 3 in Lalive et al. (2006).

Figure 3 from the paper Lalive et al. (2006)

Answer 3:

line_type=c("dashed", "dashed", "dashed", "dashed","solid", "solid","solid","solid")

kmcurve <- survfit(Surv(dur104, uncc) ~ after, data = udat)

ggsurvplot(kmcurve,
           ggtheme = theme_minimal(),
           linetype = line_type,
           censor.size = 0.1,
           size = 0.5,
           palette = c("#007BB9", "#50677D"),
           legend.labs = c("Before 1989", "After 1989"),
           facet.by = "type") +
  guides(linetype = 'none') +
  labs(
    x = "Unemployment duration (weeks)",
    y = "survival probability",
    subtitle = "Kaplan-Meier survivor functions"
  ) +
  theme_minimal() +
  theme(
    legend.position = "top",
    plot.subtitle = element_text(
      size = 16L,
      hjust = 0.5
    ),
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank()
  ) +

  scale_color_manual(values = c("#007BB9", "#50677D"), labels = c("Before 1989", "After 1989")) +
  facet_wrap(vars(type)) 
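For reference, the Kaplan-Meier estimator that survfit() computes is simply the running product of (1 - d/n) over the distinct event times, where n is the number still at risk and d the number of exits. A self-contained toy check on hypothetical spell data:

```r
library(survival)

# Toy spells: durations in weeks; event = 1 means exit observed, 0 censored
time  <- c(2, 3, 3, 5, 7, 8)
event <- c(1, 1, 0, 1, 1, 0)

fit <- survfit(Surv(time, event) ~ 1)

# By hand: at each distinct event time multiply S(t) by (1 - d/n)
ut     <- sort(unique(time[event == 1]))
n_risk <- sapply(ut, function(t) sum(time >= t))
d_exit <- sapply(ut, function(t) sum(time == t & event == 1))
S_hand <- cumprod(1 - d_exit / n_risk)

all.equal(fit$surv[fit$n.event > 0], S_hand)
## [1] TRUE
```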

KM estimates of the unemployment exit hazard

Question 4

(Survival Functions) Seek to reproduce Figure 4 in Lalive et al. (2006).

Figure 4 from the paper Lalive et al. (2006)

Answer 4:

library(muhaz)
library(biostat3)
library(bshazard)

hazards <- udat %>%
  group_by(type, after) %>%
  do(as.data.frame.muhaz(muhaz(.$dur104, .$uncc, min.time = 0, max.time = 104, bw.method = "global", b.cor = "none"))) %>%
  ungroup()
plot1=ggplot(hazards) +
  aes(x = x, y = y, group = factor(after)) +
  geom_line(aes(linetype = factor(after), color = factor(after)), linewidth = 0.8) +
  scale_linetype_manual(values = c("dashed", "solid")) +
  scale_color_manual(values = c( "#007BB9", "#50677D"),labels = c("Before 1989", "After 1989")) +
  labs(
    x = "Unemployment duration (weeks)",
    y = "Hazard",
    subtitle = "Kaplan–Meier unemployment exit rates",
    color = element_blank(),
    linetype = element_blank()
  ) +
  theme_minimal() +
  theme(
    legend.position = "top",
    plot.subtitle = element_text(
      size = 16L,
      hjust = 0.5
    ),
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank()
  ) +
  facet_wrap(vars(type)) +
  xlim(0, 100) +
  ylim(0, 0.15)+
  guides(linetype = "none")

plot(plot1) 

As we can see, the estimated function presents some jumps and, although the results are similar, it does not fully match Figure 4 of the paper. There is a small correction we can still make: when we truncate the duration variable dur to create dur104, the censoring indicator uncc should also be updated, because spells cut off at 104 weeks are no longer observed to completion. If we make this correction we obtain:

table_before=as.data.frame(table(udat$uncc))

table_before %>%
  tibble::as_tibble() %>%
  gt::gt() %>%
  gt::tab_header(
    title = "Variable uncc BEFORE updating")
Variable uncc BEFORE updating

| Var1 | Freq   |
|------|--------|
| 0    | 9478   |
| 1    | 216343 |
## Spells longer than 104 weeks are no longer observed to completion
udat$uncc <- ifelse(udat$dur > 104, 0, udat$uncc)
table_after <- as.data.frame(table(udat$uncc))

table_after %>%
  tibble::as_tibble() %>%
  gt::gt() %>%
  gt::tab_header(
    title = "Variable uncc AFTER updating")
Variable uncc AFTER updating

| Var1 | Freq   |
|------|--------|
| 0    | 11866  |
| 1    | 213955 |
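The correction just applied, capping durations at 104 weeks and marking longer spells as censored, can be packaged as a small helper. This is only a sketch of the logic; censor_at is a hypothetical name:

```r
# Sketch: administratively right-censor spells at a cap (default 104 weeks).
# Spells longer than the cap get duration = cap and are marked censored,
# since we no longer observe their exit.
censor_at <- function(dur, unc, cap = 104) {
  data.frame(dur_c = pmin(dur, cap),
             unc_c = ifelse(dur > cap, 0, unc))
}

censor_at(c(10, 120, 104, 200), c(1, 1, 1, 0))
##   dur_c unc_c
## 1    10     1
## 2   104     0
## 3   104     1
## 4   104     0
```

Note that a spell of exactly 104 weeks (third row) keeps its original censoring status, since it is fully observed.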

Let’s redo question 4 with this corrected version of the dataset. We now obtain a result much closer to the one in the paper, because uncc takes into account the truncation applied to dur (we are in fact using dur104): the ‘jumps’ disappear, and the estimated hazard is well behaved across the entire domain.

hazards <- udat %>%
  group_by(type, after) %>%
  do(as.data.frame.muhaz(muhaz(.$dur104, .$uncc, min.time = 0,max.time=104, bw.method = "global", b.cor = "none"))) %>%
  ungroup()
plot2=ggplot(hazards) +
  aes(x = x, y = y, group = factor(after)) +
  geom_line(aes(linetype = factor(after), color = factor(after)), linewidth = 0.8) +
  scale_linetype_manual(values = c("dashed", "solid")) +
  scale_color_manual(values = c( "#007BB9", "#50677D"),labels = c("Before 1989", "After 1989")) +
  labs(
    x = "Unemployment duration (weeks)",
    y = "Hazard",
    #title = "Figure 4",
    subtitle = "Kaplan–Meier unemployment exit rates",
    color = element_blank(),
    linetype = element_blank()
  ) +
  theme_minimal() +
  theme(
    legend.position = "top",
    plot.subtitle = element_text(
      size = 16L,
      hjust = 0.5
    ),
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank()
  ) +
  facet_wrap(vars(type)) +
  xlim(0, 100) +
  ylim(0, 0.15)+
  guides(linetype = "none")
  
plot(plot2)

To better visualize the difference between these plots, I compare them side by side:

Estimating the causal treatment effect in a PH model

Question 5

Estimate the causal treatment effect in a PH model.

Answer 5:

udat %>%
  mutate(all = tr * (t39 + t52) ) ->
  udat

breaks <- seq(from=3,to=59, by=4)
labels <- paste("(", c(0,breaks), ",", c(breaks,104), "]",sep="")

gux <- survSplit(Surv(dur104,uncc) ~., data=udat, cut = breaks,
                 end = "time", event="death", start="start", episode="interval")

gux %>%
  mutate(exposure = time - start,
        interval=factor(interval+1, labels = labels) ) ->
  gux
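To see concretely what the survSplit call above does, here is a toy sketch with a single hypothetical spell of 10 weeks cut at weeks 3 and 7: the spell becomes one row per interval, with the event flag set only in the interval where the exit occurs.

```r
library(survival)

# One spell of 10 weeks with an observed exit, split at weeks 3 and 7
toy <- data.frame(id = 1, dur = 10, ev = 1)
s <- survSplit(Surv(dur, ev) ~ ., data = toy, cut = c(3, 7),
               end = "time", event = "death", start = "start",
               episode = "interval")
s[, c("id", "start", "time", "death")]
##   id start time death
## 1  1     0    3     0
## 2  1     3    7     0
## 3  1     7   10     1
```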
pwe <- coxph(Surv(exposure, death) ~ interval + age +
               factor(single) + factor(married) + factor(divorced) +
               factor(sfrau) + factor(f_marr) + factor(f_single) + factor(f_divor) +
               factor(lehre) + factor(med_educ) + factor(hi_educ) +
               nwage_pj + factor(bc) + lwage + ten72 + pnon_10 +
               factor(seasonal) + factor(manuf) +
               factor(y1988) + factor(y1989) + factor(y1990) + factor(y1991) +
               factor(q2) + factor(q3) + factor(q4) +
               interval:tr + interval:t39 + interval:t52 + interval:all +
               interval:after0 + interval:tr_a0 + interval:t39_a0 +
               interval:t52_a0 + interval:t39tra0 + interval:t52tra0,
             data = gux)
library(stargazer)

stargazer(pwe, 
          dep.var.caption="",dep.var.labels="",
          keep=1:15,
          omit.table.layout = "n", star.cutoffs = NA,
          keep.stat=c("n", "ll"),no.space=TRUE,
          header=FALSE,
          title="The PWE model", type="text"
          )
## 
## The PWE model
## ===============================
##                                
## -------------------------------
## interval(3,7]        0.631     
##                     (0.022)    
## interval(7,11]       1.093     
##                     (0.022)    
## interval(11,15]      1.337     
##                     (0.023)    
## interval(15,19]      1.380     
##                     (0.025)    
## interval(19,23]      1.508     
##                     (0.027)    
## interval(23,27]      1.203     
##                     (0.034)    
## interval(27,31]      1.330     
##                     (0.037)    
## interval(31,35]      1.196     
##                     (0.045)    
## interval(35,39]      0.818     
##                     (0.059)    
## interval(39,43]      0.684     
##                     (0.069)    
## interval(43,47]      0.573     
##                     (0.079)    
## interval(47,51]      0.441     
##                     (0.091)    
## interval(51,55]      0.313     
##                     (0.103)    
## interval(55,59]      0.223     
##                     (0.114)    
## interval(59,104]     0.201     
##                     (0.064)    
## -------------------------------
## Observations       1,057,906   
## Log Likelihood   -2,879,250.000
## ===============================

Comparing the coefficients of all the variables (obtained by simply commenting out keep=1:15 in the stargazer call), we can see that they are very similar to those reported in the paper.

Final remarks

Having reached this point, we conclude with some final considerations on the work carried out. Replicating an academic paper that relies on data but does not report all the formulas and code behind its results can be complicated and treacherous. On the one hand, there is the problem of understanding in detail what the authors are doing, since even the slightest difference in interpretation, and therefore in implementation, can generate large differences in the final output. On the other hand, there is the risk of introducing imperfections in the more practical work of writing the code, or of using different algorithms and models, once again compromising the final result.